[57]



ABSTRACT 



A computer system comprising a microprocessor architecture capable of supporting multiple processors. Data transfers between data and instruction caches, I/O devices, and a memory are handled using a switch network. Access to memory buses is controlled by arbitration circuits which utilize fixed and dynamic priority schemes. A test and set bypass circuit is provided for preventing a loss of memory bandwidth due to spin-locking. A row match comparison circuit is provided for reducing memory latency by giving an increased priority to successive requests for access to memory locations having the same row address. Dynamic switch/port arbitration is provided by changing device priority based on the intrinsic priority of the device, the number of times that a request has been serviced based on a row match, the number of times that a device has been denied service, and the number of times that a device has been serviced.
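By way of illustration only, the dynamic priority adjustment summarized above might be modeled in software as follows. This is a minimal sketch under stated assumptions and not the disclosed circuit: the names `Device` and `DynamicArbiter`, the additive weights (10 and 100) and the three limits (`row_budget`, `denial_limit`, `service_limit`) are hypothetical values introduced here, since the abstract gives no numeric parameters.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    intrinsic: int     # larger value = more urgent device (assumed encoding)
    denied: int = 0    # consecutive arbitrations lost while requesting
    serviced: int = 0  # consecutive arbitrations won


class DynamicArbiter:
    """Toy model: intrinsic priority adjusted by row match, denials and grants."""

    def __init__(self, devices, row_budget=4, denial_limit=3, service_limit=8):
        self.devices = {d.name: d for d in devices}
        self.row_reload = row_budget   # value the row-match counter is reloaded with
        self.row_budget = row_budget   # remaining grants allowed on a row match
        self.denial_limit = denial_limit
        self.service_limit = service_limit
        self.last_row = None           # row address of the previously serviced request

    def _row_hit(self, row):
        # A row match only counts while the row-match counter is non-zero.
        return row == self.last_row and self.row_budget > 0

    def _score(self, dev, row):
        score = dev.intrinsic
        if self._row_hit(row):
            score += 10                # same row as the last serviced request
        if dev.denied >= self.denial_limit:
            score += 100               # denied too often: promote
        if dev.serviced >= self.service_limit:
            score -= 100               # serviced too often: hold back
        return score

    def grant(self, requests):
        """requests maps device name -> row address; returns the winning name."""
        winner = max(requests,
                     key=lambda n: self._score(self.devices[n], requests[n]))
        if self._row_hit(requests[winner]):
            self.row_budget -= 1       # one more grant made on a row match
        else:
            self.row_budget = self.row_reload
        self.last_row = requests[winner]
        for name in requests:
            dev = self.devices[name]
            if name == winner:
                dev.serviced += 1
                dev.denied = 0
            else:
                dev.denied += 1
                dev.serviced = 0
        return winner
```

In this model a low-priority requester that loses `denial_limit` arbitrations in a row wins the next one, and a requester whose row address matches the previously serviced row can overtake a device of higher intrinsic priority until the row-match counter runs out.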



35 Claims, 9 Drawing Sheets 
MICROPROCESSOR ARCHITECTURE WITH A SWITCH NETWORK FOR DATA TRANSFER BETWEEN CACHE, MEMORY PORT, AND IOU

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is related to the following applications, all assigned to the Assignee of the present application:

1. HIGH-PERFORMANCE RISC MICROPROCESSOR ARCHITECTURE, invented by Le Nguyen et al, application Ser. No. 07/727,006, now abandoned;

2. EXTENSIBLE RISC MICROPROCESSOR ARCHITECTURE, invented by Quang Trang et al, application Ser. No. 07/727,058, abandoned;

3. RISC MICROPROCESSOR ARCHITECTURE WITH ISOLATED ARCHITECTURAL DEPENDENCIES, invented by Yoshi Miyayama, application Ser. No. 07/726,744, abandoned;

4. RISC MICROPROCESSOR ARCHITECTURE IMPLEMENTING MULTIPLE TYPED REGISTER SETS, invented by Sanjiv Garg, application Ser. No. 07/726,773, pending;

5. RISC MICROPROCESSOR ARCHITECTURE IMPLEMENTING FAST TRAP AND EXCEPTION STATE, invented by Quang Trang et al, application Ser. No. 07/726,942, abandoned;

6. SINGLE CHIP PAGE PRINTER CONTROLLER, invented by [illegible], application Ser. No. 07/726,529, abandoned.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to microprocessor architecture in general and in particular to a microprocessor architecture capable of supporting multiple heterogeneous microprocessors.

2. Description of the Related Art

A computer system comprising a microprocessor architecture capable of supporting multiple processors typically comprises a memory, a memory system bus comprising data, address and control signal buses, an input/output I/O bus comprising data, address and control signal buses, a plurality of I/O devices and a plurality of microprocessors. The I/O devices may comprise, for example, a direct memory access (DMA) controller-processor, an ethernet chip, and various other I/O devices. The microprocessors may comprise, for example, a plurality of general purpose processors as well as special purpose processors. The processors are coupled to the memory by means of the memory system bus and to the I/O devices by means of the I/O bus.

To enable the processors to access the MAU and the I/O devices without conflict, it is necessary to provide a mechanism which assigns a priority to the processors and I/O devices. The priority scheme used may be a fixed priority scheme or a dynamic priority scheme which allows for changing priorities on the fly as system conditions change, or a combination of both schemes. It is also important to provide in such a mechanism a means for providing ready access to the memory and the I/O devices by all processors in a manner which provides for minimum memory and I/O device latency while at the same time providing for cache coherency. For example, repeated use of the system bus to access semaphores which are denied can significantly reduce system bus bandwidth. Separate processors cannot be allowed to read and write the same data unless precautions are taken to avoid problems with cache coherency.

SUMMARY OF THE INVENTION

In view of the foregoing, a principal object of the present invention is a computer system comprising a microprocessor architecture capable of supporting multiple heterogeneous processors which are coupled to multiple arrays of memory and a plurality of I/O devices by means of one or more I/O buses. The arrays of memory are grouped into subsystems with interface circuits known as Memory Array Units or MAU's. In each of the processors there is provided a novel memory control unit (MCU). Each of the MCU's comprises a switch network comprising a switch arbitration unit, a data cache interface circuit, an instruction cache interface circuit, an I/O interface circuit and one or more memory port interface circuits known as ports, each of said port interface circuits comprising a port arbitration unit.

The switch network is a means of communication between a master and a slave device. To the switch, the possible master devices are a D-cache, an I-cache, or an I/O controller unit (IOU) and the possible slave devices are a memory port or an IOU.

The function of the switch network is to receive the various instructions and data requests from the cache control units (CCU) (D-cache, I-cache) and the IOU. After having received these requests, the switch arbitration unit in the switch network and the port arbitration unit in the port interface circuit prioritizes the requests and passes them to the appropriate memory port (depending on the instruction address). The port, or ports as the case may be, will then generate the necessary timing signals and receive or send the necessary data to/from the MAU. If it is a write (WR) request, the interaction between the port and the switch stops when the switch has pushed all the write data into the write data FIFO (WDF) from the switch. If it is a read (RD) request, the interaction between the switch and the port only ends when the port has sent the read data back to the requesting master through the switch.

The switch network is composed of four sets of tri-state buses that provide the connection between the cache, IOU and the memory ports. The four sets of tri-state buses comprise SW_REQ, SW_WD, SW_RD and SW_IDBST. In a typical embodiment of the present invention, the bus SW_REQ comprises 29 wires which is used to send the address, ID and share signal from a master device to a slave device. The ID is a tag associated with a memory request so that the requesting device is able to associate the returning data with the correct memory address. The share signal is a signal indicating that a memory access is to shared memory. When the master device is issuing a request to a slave, it is not necessary to send the full 32 bits of address on the switch. This is because in a multimemory port structure, the switch would have decoded the address and would have known whether the request was for memory port 0, port 1 or the IOU, etc. Since each port has a pre-defined memory space allotted to it, there is no need to send the full 32 bits of address on SW_REQ.

In practice, other request attributes such as, for example, a function code and a data width attribute are not
sent on the SW_REQ because of timing constraints. If the information were to be carried over the switch, it would arrive at the port one phase later than needed, adding more latency to memory requests. Therefore, such request attributes are sent to the port on dedicated wires so that the port can start its state machine earlier and thereby decrease memory latency.

Referring to FIG. 8, the bus SW_WD comprises 32 wires and is used to send the write data from the master device (D-cache and IOU) to the FIFO at the memory port. It should be noted that the I-cache reads data only and does not write data. This tri-state bus is "double-pumped" which means that a word of data is transferred on each clock phase, reducing the wires needed, and thus the circuit costs. WD00, WD01, WD10 and WD11 are words of data. Since the buses are double-pumped, care is taken to insure that there is no bus conflict when the buses turn around and switch from a master to a new master.

Referring to FIG. 9, the bus SW_RD comprises 64 wires and is used to send the return read data from the slave device (memory port and IOU) back to the master device. Data is only sent during phase 1. This bus is not double-pumped because of timing constraints of the caches in that the caches require that the data be valid at the falling edge of CLK1. Since the data is not available from the port until phase 1 when clock 1 is high, if an attempt were made to double-pump the SW_RD bus, the earliest that a cache would get the data is at the positive edge of CLK1 and not the negative edge thereof. Since bus SW_RD is not double-pumped, this bus is only active (not tri-stated) during phase 2. There is no problem with bus driver conflict when the bus switches to a different master.

The bus SW_IDBST comprises four wires and is used to send the identification (ID) from a master to a slave device and the ID and bank start signals from the slave to the master device.

In a current embodiment of the present invention there is only one ID FIFO at each slave device. Since data from a slave device is always returned in order, there is no need to send the ID down to the port. The ID could be stored in separate FIFO's, one FIFO for each port, at the interface between the switch and the master device. This requires an increase in circuit area over the current embodiment since each interface must now have n FIFO's if there are n ports, but the tri-state wires can be reduced by two.

The port interface is an interface between the switch network and the external memory (MAU). It comprises a port arbitration unit and means for storing requests that cause interventions and interrupted read requests. It also includes a snoop address generator. It also has circuits which act as signal generators to generate the proper timing signals to control the memory modules.

There are several algorithms which are implemented in apparatus in the switch network of the present invention including a test and set bypass circuit comprising a content addressable memory (CAM), a row match comparison circuit and a dynamic switch/port arbitration circuit.

The architecture implements semaphores, which are used to synchronize software in multiprocessor systems, with a "test and set" instruction as described below. Semaphores are not cached in the architecture. The cache fetches the semaphore from the MCU whenever the CPU executes a test and set instruction.

The test and set bypass circuit implements a simple algorithm that prevents a loss of memory bandwidth due to spin-locking, i.e. repeated requests for access to the MAU system bus, for a semaphore. When a test instruction is executed on a semaphore which locks a region of memory, device or the like, the CAM stores the address of the semaphore. This entry in the CAM is cleared when any processor performs a write to a small region of memory enclosing the semaphore. If the requested semaphore is still resident in the CAM, the semaphore has not been released by another processor and therefore there is no need to actually access memory for the semaphore. Instead, a block of logical 1's ($FFFF's) (semaphore failed) is sent back to the requesting cache indicating that the semaphore is still locked and the semaphore is not actually accessed, thus saving memory bandwidth.

A write of anything other than all 1's to a semaphore clears the semaphore. The slave CPU then has to check the shared memory bus to see if any CPU (including itself) writes to the relevant semaphore. If any CPU writes to a semaphore that matches an entry in the CAM, that entry in the CAM is cleared. When a cache next attempts to access the semaphore, it will not find that entry in the CAM and will then actually fetch the semaphore from main memory and set it to failed, i.e. all 1's.

The function of the row match comparison circuit is to determine if the present request has the same row address as the previous request. If it does, the port need not de-assert RAS and incur a RAS pre-charge time penalty. Thus, memory latency can be reduced and usable bandwidth increased. Row match is mainly used for dynamic random access memory (DRAM) but it can also be used for static random access memory (SRAM) or read-only memory (ROM) in that the MAU now need not latch in the upper bits of a new address. Thus, when there is a request for access to the memory, the address is sent on the switch network address bus SW_REQ, the row address is decoded and stored in a MUX latch. If this address is considered the row address of a previous request, when a cache or an IOU issues a new request, the address associated with the new address is decoded and its row address is compared with the previous row address. If there is a match, a row match hit occurs and the matching request is given priority as explained below.

In the dynamic switch/port arbitration circuit, two different arbitrations are performed. One is for arbitrating for the resources of the memory ports, i.e. port 0 . . . port N, and the other is an arbitration for the resources of the address and write data buses of the switch network, SW_REQ and SW_WD, respectively.

Several devices can request data from main memory at the same time. They are the D- and I-cache and the IOU. A priority scheme whereby each master is endowed with a certain priority is set up so that the requests from more "important" or "urgent" devices are serviced as soon as possible. However, a strict fixed arbitration scheme is not used due to the possibility of starving the lower priority devices. Instead, a dynamic arbitration scheme is used which allocates different priorities to the various devices on the fly. This dynamic scheme is affected by the following factors:

1. Intrinsic priority of the device,

2. Does the requested address have a row match with the previously serviced request?
3. Has the device been denied service too many times?

4. Has that master been serviced too many times?

Each request from a device has an intrinsic priority. IOU has the highest priority followed by the D- and I-cache, respectively. An intervention (ITV) request as described below, from the D-cache, however, has the highest priority of all since it is necessary that the slave processing element (PE) has the updated data as soon as possible.

The intrinsic priority of the various devices is modified by several factors. The number of times a lower priority device is denied service is monitored and when such number reaches a predetermined number, the lower priority device is given a higher priority. In contrast, the number of times a device is granted priority is also monitored so that if the device is a bus "hog", it can be denied priority to allow a lower priority device to gain access to the bus. A third factor used for modifying the intrinsic priority of a request is row match. Row match is important mainly for the I-cache. When a device requests a memory location which has the same row address as the previously serviced request, the priority of the requesting device is increased. This is done so as to avoid having to de-assert and re-assert RAS. Each time a request is serviced because of a row match, a programmable counter is decremented. Once the counter reaches zero, for example, the row match priority bit is cleared to allow a new master to gain access to the bus. The counter is again pre-loaded with a programmable value when the new master of the port is different from the old master or when a request is not a request with a row match.

A write request for a memory port will only be granted when the write data bus of the switch network (SW_WD) is available. If it is not available, some other request is selected. The only exception is for an intervention (ITV) request from the D-cache. If such a request is present and the SW_WD bus is not available, no request is selected. Instead, the system waits for the SW_WD bus to become free and then the intervention request is granted.

Two software-selectable arbitration schemes for the switch network are employed. They are as follows:

1. Slave priority in which priority is based on the slave or the requested device (namely, memory or IOU port).

2. Master priority which is based on the master or the requesting device (namely, IOU, D- and I-cache).

In the slave priority scheme, priority is always given to the memory ports, e.g. port 0, 1, 2 . . . first, then to the IOU and then back to port 0, a scheme generally known as a round robin scheme. The master priority scheme is a fixed priority scheme in which priority is given to the IOU and then to the D- and I-caches respectively. Alternatively, an intervention (ITV) request may be given the highest priority under the master priority scheme in switch arbitration. Also, an I-cache may be given the highest priority if the pre-fetch buffer is going to be empty soon.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects, features and advantages of the present invention will become apparent from the following detailed description of the accompanying drawings, in which:

FIG. 1 is a block diagram of a microprocessor architecture capable of supporting multiple heterogeneous microprocessors according to the present invention;

FIG. 2 is a block diagram of a memory control unit according to the present invention;

FIG. 3 is a block diagram of a switch network showing interconnects between a D-cache interface and a port interface according to the present invention;

FIG. 4 is a block diagram of a test and set bypass circuit according to the present invention;

FIG. 5 is a block diagram of a circuit used for generating intervention signals and arbitrations for an MAU bus according to the present invention;

FIG. 6 is a block diagram of a row match comparison circuit according to the present invention;

FIG. 7 is a diagram of a dynamic arbitration scheme according to the present invention;

FIG. 8 is a diagram showing the timing of a write request; and

FIG. 9 is a diagram showing the timing of a read request.

DETAILED DESCRIPTION OF THE DRAWINGS

Referring to FIG. 1, there is provided in accordance with the present invention a microprocessor architecture designated generally as 1. In the architecture 1 there is provided a plurality of general purpose microprocessors 2, 3, 4 ... N, a special purpose processor 5, an arbiter 6 and a memory/memory array unit (MAU) 7. The microprocessors 2-N may comprise a plurality of identical processors or a plurality of heterogeneous processors. The special purpose processor 5 may comprise, for example, a graphics controller. All of the processors are coupled [partly illegible in source] to an MAU system bus 25 comprising an MAU data bus 8, a ROW/COL address bus 9, a [illegible] bus 10, an MAU control bus 11 and a [illegible] bus 12 by means of a plurality of bidirectional signal buses 13-17, respectively. The bus 12 is used, for example, for requesting [illegible] and for granting or indicating that the system data bus 8 is busy. The arbiter 6 is coupled to the bus 12 by means of a bidirectional signal line 18. The MAU 7 is coupled to the ROW/COL address and memory control buses 9 and 11 for transferring signals from the buses to the MAU by means of unidirectional signal lines 19 and 20 and to the MAU data bus 8 by means of bidirectional data bus 21. Data buses 8 and 21 are typically 64 bit buses; however, they may be operated as 32 bit buses under software control. The bus may be scaled to other widths, e.g. 128 bits.

Each of the processors 2-N typically comprises an input/output IOU interface 53, which will be further described below with respect to FIG. 2, coupled to a plurality of peripheral I/O devices, such as a direct memory access (DMA) processor 30, an ETHERNET interface 31 and other I/O devices 32 by means of a 32 bit I/O bus 33 or an optional 32 bit I/O bus 34 and a plurality of 32 bit bidirectional signal buses 35-42. The optional I/O bus 34 may be used by one or more of the processors to access a special purpose I/O device 43.

Referring to FIG. 2, each of the processors 2-N comprises a memory control unit (MCU) designated generally as 50, coupled to a cache control unit (CCU) 49 comprising a data (D) cache 51 and an instruction (I)
cache 52 and an I/O port 53, sometimes referred to herein simply as IOU, coupled to the I/O bus 33 or 34.

The MCU 50 is a circuit whereby data and instructions are transferred (read or written) between the CCU 49, i.e. both the D-cache 51 and the I-cache 52 (read only), the IOU 53 and the MAU 7 via the MAU system bus 25. The MCU 50, as will be further described below, provides cache coherency. Cache coherency is achieved by having the MCU in each slave CPU monitor, i.e. snoop, all transactions of a master CPU on the MAU address bus 9 to determine whether the cache in the slave CPU has to request new data provided by the master CPU or send new data to the master CPU. The MCU 50 is expandable for use with six memory ports and can support up to four-way memory interleave on the MAU data bus 8. It is able to support the use of an external 64- or 32-bit data bus 8 and uses a modified Hamming code to correct one data bit error and detect two or more data bit errors.

In the architecture of the present invention, cache sub-block, i.e. cache line, size is a function of memory bus size. For example, if the bus size is 32 bits, the sub-block size is typically 16 bytes. If the bus size is 64 bits, the sub-block size is typically 32 bytes. If the bus size is 128 bits, the sub-block size is 64 bytes. As indicated, the MCU 50 is designed so that it can be programmed to support 1, 2 or 4-way interleaving, i.e. number of bytes transferred per cycle.

In the MCU 50 there is provided one or more port interfaces designated port P_0 . . . P_N, a switch network 54, a D-cache interface 55, an I-cache interface 56 and an I/O interface 57. As will be further described below with respect to FIG. 3, each of the port interfaces P_0-P_N comprises a port arbitration unit designated, respectively, PAU_0 . . . PAU_N. The switch network 54 comprises a switch arbitration unit 58.

When the MCU 50 comprises two or more port interfaces, each of the port interfaces P_0-P_N is coupled to a separate MAU system bus, which is identical to the bus 25 described above with respect to FIG. 1. In FIG. 2, two such buses are shown designated 25_0 and 25_N. The bus 25_N comprises buses 8_N, 9_N, 10_N, 11_N and 12_N which are connected to port P_N by buses 13_N, 14_N, 15_N, 16_N and 17_N, respectively. Buses 8_N-17_N are identical to buses 8-17 described above with respect to FIG. 1. Similarly, each of the port interfaces is coupled to the switch network 54 by means of a plurality of separate identical buses including write (WR) data buses 60, 60_N, read (RD) data buses 61, 61_N and address buses 62, 62_N and to each of the cache and I/O interfaces 55, 56, 57 by means of a plurality of control buses 70, 71, 80, 81, 90 and 91 and 70_N, 71_N, 80_N, 81_N, 90_N and 91_N where the subscript N identifies the buses between port interface P_N and the cache and I/O interfaces.

The switch network 54 and the D-cache interface 55 are coupled by means of a WR data bus 72, an RD data bus 73 and an address bus 74. The switch network 54 and the I-cache interface 56 are coupled by means of an RD data bus 82 and an address bus 83. It should be noted that the I-cache 52 does not issue write (WR) requests. The switch network 54 and the I/O interface 57 are coupled by means of a plurality of bidirectional signal buses including an RD data bus 92, a WR data bus 93 and an address bus 94.

The D-cache interface 55 and the CCU 49, i.e. D-cache 51, are coupled by means of a plurality of unidirectional signal buses including a WR data bus 100, an RD data bus 101, an address bus 102 and a pair of control signal buses 103 and 104. The I-cache interface 56 and the CCU 49, i.e. I-cache 52, are coupled by means of a plurality of unidirectional signal buses including an RD data bus 110, an address bus 111, and a pair of control signal buses 112 and 113. The I/O interface 57 and the IOU 53 are coupled by means of a plurality of unidirectional signal buses including an R/W-I/O master data bus 120, an R/W-I/O slave data bus 121, a pair of control signal lines 123 and 124 and a pair of address buses 125 and 126. The designations I/O master and I/O slave are used to identify data transmissions on the designated signal lines when the I/O is operating either as a master or as a slave, respectively, as will be further described below.

Referring to FIG. 3, there is provided a block diagram of the main data path of the switch network 54 showing the interconnections between the D-cache interface 55 and port interface P_0. Similar interconnects are provided for port interfaces P_1-P_N and the I-cache and I/O interfaces 56, 57 except that the I-cache interface 56 does not issue write data requests. As shown in FIG. 3, there is further provided in each of the port interfaces P_0-P_N an identification (ID) first in, first out (FIFO) 130 which is used to store the ID of a read request, a write data (WD) FIFO 131 which is used to temporarily store write data until access to the MAU is available and a read data (RD) FIFO 132 which is used to temporarily store read data until the network 54 is available.

In the switch network 54 there is provided a plurality of signal buses 140-143, also designated, respectively, as request/address bus SW_REQ[28:0], write data bus SW_WD[31:0], read data bus SW_RD[63:0] and identification/bank start signal bus SW_IDBST[3:0] and the switch arbitration unit 58. The switch arbitration unit 58 is provided to handle multiport I/O requests.

The cache and port interface are coupled directly by some control signal buses and indirectly by others via the switch network buses. For example, the port arbitration unit PAU in each of the port interfaces P_0-P_N is coupled to the switch arbitration unit 58 in the switch network 54 by a pair of control signal buses including a GRANT control line 70a and a REQUEST control line 71a. The switch arbitration unit 58 is coupled to the D-cache interface 55 by a GRANT control signal line 71b. Lines 70a and 70b and lines 71a and 71b are signal lines in the buses 70 and 71 of FIG. 2. A gate 75 and registers 76 and 78 are also provided to store requests that cause interventions and to store interrupted read requests, respectively. Corresponding control buses are provided between the other port, cache and I/O interfaces.

The function of the switch network 54 is to receive the various instructions and data requests from the cache control units (CCU), i.e. the D-cache 51 and I-cache 52, and the IOU 53. In response to receiving the requests, the switch arbitration unit 58 in the switch network 54, which services one request at a time, prioritizes the requests and passes them to the appropriate port interface P_0-P_N or I/O interface depending upon the address accompanying the request. The port and I/O interfaces are typically selected by means of the high order bits in the address accompanying the request. Each port interface has a register 77 for storing the MAU addresses. The port interface will then generate the necessary timing signals and transfer the necessary data to/from the MAU 7. If the request is a WR request, the interaction between the port interface and the switch network
54 stops when the switch has pushed all of the write SW_RD bus is not double-pumped, this bus is only 

data into the WDF (write data FIFO) 131. If it is a RD active (not tri-stated) during CLK1 and there is no 

request, the interaction between the switch network 54 problem with bus buffer conflict where two bus drivers 

and the port interface only ends when the port interface drive the same wires at the same time, 

has sent the read data back to the switch network 54. 5 The SW_IDBST[3:0] is used to return the identifica- 

As will be further described below, the switch net- tion (ID) code and a bank start code from the slave to 

work 54 is provided for communicating between a mas- the master device via the bus 88. Since data from a slave 

ter and a slave device. In this context, the possible mas- device is always returned in order, there is generally no 

ter devices are: need to send the ID down to the port. The ID can be 

1. D-cache 10 stored in separate FIFO's, one FIFO for each port in 

2. I-cache the interface. 

3. IOU Referring again to the read FIFO 132, data is put into 

and the possible slave devices are: this FIFO only when the switch read bus SW_RD is 

1. memory port not available. If the bus SW_RD is currently being 

2. IOU 15 used by some other port, the oncoming read data is 
The switch network 54 is responsible for sending the temporarily pushed into the read FIFO 132 and when 

necessary intervention requests to the appropriate port the SW_RD bus is released, data is popped from the 

interface for execution. FIFO and transferred through the switch network 54 to 

As described above, the switch network 54 comprises the requesting cache or IOU. 

four sets of tri-state buses that provide the connection 20 The transfer of data between the D-cache interface 

between the cache, I/O and memory port interfaces. 55, the I-cache interface 56, the I/O interface 57 and the 

The four sets of tri-state buses are SW_REQ, port interfaces P_0-P_N will now be described using data 

SW_WD, SW_RD and SW_IDBST. The bus desig- transfers to/from the D-cache interface 55 as an exam- 

nated SW_REQ[28:0] is used to send the address in the ple. 

slave device and the memory share signal and the ID When one of the D-cache, I-cache or IOU's wants to 
from the master device to the slave device. As indicated access a port, it checks to see if the port is free by send- 
above, the master may be the D-cache, I-cache or an ing the request to the port arbitration unit PAU_0 on the 
IOU and the slave device may be a memory port or an request signal line 70b as shown in FIG. 3. If the port is 
IOU. When the master device is issuing a request to a free, the port interface informs the switch arbitration 
slave, it is not necessary to send the full 32 bits of ad- unit 58 on the request control line 71a that there is a 
dress on the switch bus SW_REQ. This is because in request. If the switch network 54 is free, the switch 
the multiple memory port structure of the present in- arbitration unit 58 informs the port on the grant control 
vention, each port has a pre-defined memory space line 70a and the master, e.g. D-cache interface 55, that 
allotted to it. the request is granted on the control line 71b. 

Other request attributes such as the function code 35 If the request is a write request, the D-cache interface 

(FC) and the data width (WD) are not sent on the circuit 55 checks the bus arbitration control unit 172 to 

SW_REQ bus because of timing constraints. The infor- determine whether the MCU 50 is granted the MAU 

mation carried over the switch network 54 arrives at the bus 25. If the MCU has not been granted the bus 25, a 

port interface one clock phase later than the case if the request is made for the bus. If and when the bus is 

information has been carried on dedicated wires. Thus, 40 granted, the port arbitration unit 171 makes a request 

the early request attributes need to be sent to the port for the switch buses 140, 141. After access to the switch 

interface one phase earlier so that the port interface can buses 140, 141 is granted, the D-cache interface circuit 

start its state machine earlier and thereby decrease 55 places the appropriate address on the switch bus 

memory latency. This is provided by a separate signal SW_REQ 140 and at the same time places the write 

line 79, as shown in FIG. 3. Line 79 is one of the lines in 45 data on the write data bus SW_WD 141 and stores it in 

the control signal bus 70 of FIG. 2. the WD FIFO (WDF) 131. When the data is in the 

The SW_WD [31:0] bus is used to send write data WDF, the MCU subsequently writes the data to the 
from the master device (D cache and IOU) to the WD MAU. The purpose of making sure that the bus is 
FIFO 131 in the memory port interface. This tri-state granted before sending the write data to the port is so 
bus is double-pumped, which means that 32 bits of data 50 that the MCU need not check the WDF when there is 
are transferred every phase. Since the buses are double- a snoop request from an external processor. Checking 
pumped, care is taken in the circuit design to insure that for modified data therefore rests solely on the cache, 
there is no bus-conflict when the buses turn around and If the request is a read request, and the port and 
switch from one master to a new master. As will be switch network are determined to be available as de- 
appreciated, double-pumping reduces the number of 55 scribed above, the port interface receives the address 
required bit lines thereby minimizing expensive wire from the requesting unit on the SW—REQ bus and 
requirements with minimal performance degradation. arbitrates using the -arbiter for the MAU bus 9. The 

Referring to FIG. 9, the SW_RD[63:0] bus is used to MAU arbiter informs the port that the MAU bus has 

send the return read data from the slave device (mem- been granted to it before the bus can actually be used, 

ory port or IOU) back to the master device. Data is sent 60 The request is then transferred to the port by the 

only during phase 1 of the clock (when CLK1 is high). switch. When the MAU address bus 9 is free, the ad- 

This bus is not double-pumped because of a timing con- dress is placed on the MAU address bus. The port 

straint of the cache. The cache requires that the data be knows, ahead of time, when data will be received. It 

valid at the falling edge, of CLK1. Since the data is requests the switch return data bus so that it is available 

received from the port interface during phase 1, if the 65 when the data returns, if it is not busy. When the bus is 

SW_RD bus was double-pumped, the earliest that the free, the port puts the read data on the bus which the 

cache would get the data would be at the positive edge D-cache, I-cache or I/O interface will then pick up and 

of CLK1, not at the negative edge of CLK1. Since the give to its respective requesting unit. 
If the D/I-cache 51,52 makes a request for an I/O address, the D/I-cache interface 55,56 submits the request to the I/O interface unit 57 via the request bus SW_REQ. If the I/O interface unit 57 has available entries in its queues for storing the requests, it will submit the request to the switch arbitration unit 58 via the control signal line 90. Once again, if the switch network 54 is free, the switch arbitration unit 58 informs the D/I cache interface 55,56 so that it can place the address on the address bus SW_REQ and, if it is a write request (D cache only), the write data on the write data bus SW_WD for transfer to the IOU. Similarly, if the request from the D/I cache interface 55,56 is a read request, the read data from the I/O interface 57 is transferred from the I/O interface 57 via the switch network 54 read data bus SW_RD and provided to the D/I cache interface 55,56 for transfer to the D/I cache 51,52.

Referring to FIG. 4, there is further provided in the port interfaces and caches in accordance with the present invention test and set (TS) bypass circuits designated generally as 160 and 168, respectively, for monitoring, i.e., snooping, for addresses of semaphores on the MAU address bus 9. As will be seen, the circuits 160, 168 reduce the memory bandwidth consumed by spin-locking for a semaphore.

In the TS circuits 160, 168 there is provided a snoop 
address generator 161, a TS content addressable mem- 
ory (CAM) 162, a flip-flop 163 and MUX's 164 and 165. 

A semaphore is a flag or label which is stored in an addressable location in memory for controlling access to certain regions of the memory or other addressable resources. When a CPU is accessing a region of memory with which a semaphore is associated, for example, and does not want to have that region accessed by any other CPU, the accessing CPU places all 1's in the semaphore. When a second CPU attempts to access the region, it first checks the semaphore. If it finds that the semaphore comprises all 1's, the second CPU is denied access. Heretofore, the second CPU would repeatedly issue requests for access and could be repeatedly denied access, resulting in what is called "spin-locking for a semaphore". The problem with spin-locking for a semaphore is that it uses an inordinate amount of memory bandwidth because for each request for access, the requesting CPU must perform a read and a write.

The Test and Set bypass circuits 160,168 of FIG. 4 are an implementation of a simple algorithm that reduces memory bandwidth utilization due to spin-locking for a semaphore.

In operation, when a CPU, or more precisely, a process in the processor, first requests access to a memory region with which a semaphore is associated by issuing a load-and-set instruction, i.e. a predetermined instruction associated with a request to access a semaphore, the CPU first accesses the semaphore and stores the address of the semaphore in the CAM 162. Plural load-and-set instructions can result in plural entries being in the CAM 162. If the semaphore contains all 1's ($FFFF's), the 1's are returned indicating that access is denied. When another process again requests the semaphore, it checks its CAM. If the address of the requested semaphore is still resident in the CAM, the CPU knows that the semaphore has not been released by another processor/process and there is therefore no need to spin-lock for the semaphore. Instead, the MCU receives all 1's (semaphore failed) and the semaphore is not requested from memory; thus, no memory bandwidth is unnecessarily used. On the other hand, if the semaphore address is not in the CAM, this means that the semaphore has not been previously requested or that it has been released.

The MAU bus does not provide byte addresses. The 
CAM must be cleared if the semaphore is released. The 
CAM is cleared if a write to any part of the smallest 
detectable memory block which encloses the sema- 
phore is performed by any processor on the MAU bus. 
The current block size is 4 or 8 bytes. In this way, the 
CAM will never hold the address of a semaphore which 
has been cleared, although the CAM may be cleared 
when the semaphore has not been cleared by a write to 
another location in the memory block. The semaphore 
is cleared when any processor writes something other 
than all 1's to it. 

If a semaphore is accessed by a test and set instruction 
after a write has occurred to the memory block contain- 
ing the semaphore, the memory is again accessed. If the 
semaphore was cleared, the cleared value is returned to 
the CPU and the CAM set with the address again. If the 
semaphore was not cleared or was locked again, the 
CAM is also loaded with the semaphore address, but the 
locked value is returned to the CPU. 
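The bypass behaviour described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `TestAndSetBypass`, the dictionary standing in for the MAU, the 16-bit locked value and the method names are all hypothetical, and the real circuits 160, 168 are hardware, not software.

```python
class TestAndSetBypass:
    """Toy model of one CPU's test-and-set bypass (CAM plus memory access)."""

    LOCKED = 0xFFFF  # "all 1's"; the 16-bit width is an assumption

    def __init__(self, memory, block_size=8):
        self.memory = memory          # dict: address -> value, stands in for the MAU
        self.block_size = block_size  # smallest detectable block (4 or 8 bytes)
        self.cam = set()              # block addresses of semaphores already requested
        self.memory_accesses = 0      # counts load-and-set cycles that reached memory

    def _block(self, address):
        # The MAU bus carries no byte addresses, so entries are tracked per block.
        return address - (address % self.block_size)

    def load_and_set(self, address):
        block = self._block(address)
        if block in self.cam:
            # Hit: the semaphore has not been written since the last attempt,
            # so "all 1's" is returned without touching memory.
            return self.LOCKED
        # Miss: read the old value, set the semaphore, and load the CAM.
        self.memory_accesses += 1
        old = self.memory.get(address, 0)
        self.memory[address] = self.LOCKED
        self.cam.add(block)
        return old

    def snoop_write(self, address):
        # Any write seen on the bus to the enclosing block clears the CAM entry.
        self.cam.discard(self._block(address))
```

In this model a second attempt on a locked semaphore costs no memory access, and the next attempt after a snooped write goes back to memory, as the text describes.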

In the operation of the circuit 160 of FIG. 4, the 
circuit 160 snoops the MAU address bus 9 and uses the 
address signals detected thereon to generate a corre- 
sponding snoop address in the address generator 161 
which is then sent on line 169 to, and compared with, 
the contents of the CAM 162. If there is a hit, i.e. a 
match with one of the entries in the CAM 162, that 
entry in the CAM 162 is cleared. When a load and set 
request is made to the MCU from, for example, a D- 
cache, the D-cache interface circuit compares the ad- 
dress with entries in the CAM. If there is a hit in the 
CAM 162, the ID is latched into the register 163 in the 
cache interface and this ID and all 1's ($FFFF) are 
returned to the cache interface by means of the MUX's 
164 and 165. 

The snooping of the addresses and the generation of 
a snoop address therefrom in the snoop address genera- 
tor 161 for comparison in the CAM 162 continues with- 
out ill effect even though the addresses appearing on the 
MAU address bus 9 are to non-shared memory loca- 
tions. The snoop address generator 161 typically gener- 
ates a cache block address (high order bits) from the 11 
bits of the MAU row and column addresses appearing 
on the MAU address bus 9 using the MAU control 
signals RAS, CAS and the BKST START MAU con- 
trol signals on the control signal bus 11. 

Referring to FIG. 5, there is provided in accordance 
with another aspect of the present invention a circuit 
designated generally as 170 for providing cache coher- 
ency. Cache coherency is necessary to insure that in a 
multiprocessor environment the master and slave de- 
vices, i.e. CPU's, all have the most up-to-date data. 

Shown outside of the chip comprising the circuit 170, 
there is provided the arbiter 6, the memory 7 and the 
MAU address bus 9, MAU control bus 11 and multipro- 
cessor control bus 10. In the circuit 170 there is pro- 
vided a port arbitration unit interface 171, a bus arbitra- 
tion control unit 172, a multiprocessor control 173 and 
the snoop address generator 161 of FIG. 4. The D- 
cache interface 55 is coupled to the multiprocessor 
control 173 by means of a pair of control signal buses 
174 and 175 and a snoop address bus 176. The I-cache 
interface 56 is coupled to the multiprocessor control 173 
by means of a pair of control signal buses 177 and 178 
and the snoop address bus 176. The snoop address generator 161 is coupled to the multiprocessor control 173 by means of a control signal bus 179. The multiprocessor control 173 is further coupled to the multiprocessor control bus 10 by means of a control signal bus 180 and to the bus arbitration control unit 172 by a control signal bus 181. The port arbitration unit interface 171 is coupled to the bus arbitration control unit 172 by a control signal bus 182. The bus arbitration control unit 172 is coupled to the arbiter 6 by a bus arbitration control bus 183. The snoop address generator 161 is also coupled to the MAU address bus 9 and the MAU control bus 11 by address and control buses 14 and 16, respectively.

A request from a cache will carry with it an attribute indicating whether or not it is being made to a shared memory. If it is to a shared memory, the port interface sends out a share signal SHARED_REQ on the multiprocessor control signal (MCS) bus 10. When other CPU's detect the share signal on the MCS bus 10 they begin snooping the MAU ADDR bus 9 to get the snoop address.

Snooping, as briefly described above, is the cache coherency protocol whereby control is distributed to every cache on a shared memory bus, and all cache controllers (CCU's) listen or snoop the bus to determine whether or not they have a copy of the shared block. Snooping, therefore, is the process whereby a slave MCU monitors all the transactions on the bus to check for any RD/WR requests issued by the master MCU. The main task of the slave MCU is to snoop the bus to determine if it has to receive any new data, i.e. invalidate data previously received, or to send the freshest data to the master MCU, i.e. effect an intervention.

As will be further described below, the multiprocessor control circuit 173 of FIG. 5 is provided to handle invalidation, intervention and snoop hit signals from the cache and other processors and generate snoop hit (SNP_HIT) signals and intervention (ITV_REQ) signals on the multiprocessor control signal bus 180 when snoop hits and intervention/invalidation are indicated, as will be further described below.

The bus arbitration control unit 172 of FIG. 5 arbitrates for the MAU bus in any normal read or write operation. It also handles arbitrating for the MAU bus in the event of an intervention/invalidation and interfaces directly with the external bus arbitration control signal pins which go directly to the external bus arbiter 6.

The operations of intervention and invalidation which provide the above-described cache coherency will now be described with respect to read requests, write requests, and read-with-intent-to-modify requests issued by a master central processing unit (MSTR CPU).

When the MSTR CPU issues a read request, it places an address on the memory array unit (MAU) address bus 9. The slave (SLV) CPU's snoop the addresses on the MAU bus 9. If a SLV CPU has data from the addressed memory location in its cache which has been modified, the slave cache control unit (SLV CCU) outputs an intervention signal (ITV) on the multiprocessor control bus 10, indicating that it has fresh, i.e. modified, data. The MSTR, upon detecting the ITV signal, gives up the bus and the SLV CCU writes the fresh data to the main memory, i.e. MAU 7. If the data requested by the MSTR has not been received by the MSTR cache control unit (CCU), the MSTR MCU discards the data requested and re-asserts its request for data from the MAU. If the data requested has been transferred to the MSTR CCU, the MSTR MCU informs the MSTR CCU (or IOU controller, if an IOU is the MSTR) to discard the data. The MSTR MCU then reissues its read request after the slave has updated main memory. Meanwhile, the port interface circuit holds the master's read request while the slave writes the modified data back to memory. Thereafter, the read request is executed.

If the MSTR issues a write request, places an address on the memory array unit (MAU) address bus 9 and a slave CCU has a copy of the original data from this address in its cache, the slave CCU will invalidate, i.e. discard, the corresponding data in its cache.

If the MSTR issues a read-with-intent-to-modify request, places an address on the memory array unit (MAU) address bus 9 and a slave MCU has the address placed on the address bus by the master (MSTR), one of two possible actions will take place:

1. If the SLV CCU has modified the data corresponding to the data addressed by the MSTR, the SLV will issue an ITV signal, the MSTR will give up the bus in response thereto and allow the SLV CCU to write the modified data to memory. This operation corresponds to the intervention operation described above.

2. If the SLV has unmodified data corresponding to the data addressed by the MSTR, the SLV will invalidate, i.e. discard, its data. This operation corresponds to the invalidation operation described above.

Referring to FIG. 6, there is provided in accordance with another aspect of the present invention a circuit designated generally as 190 which is used for row match comparison to reduce memory latency. In the circuit 190 there is provided a comparator 191, a latch 192 and a pair of MUX's 193 and 194.

The function of the row match comparison is to determine if the present request has the same row address as a previous request. If it does, the port need not incur the time penalty for de-asserting RAS. Row match is mainly used for DRAM but it can also be used for SRAM or ROM in that the MAU need not latch in the upper, i.e. row, bits of the new address, since ROM and SRAM accesses pass addresses to the MAU in high and low address segments in a manner similar to that used by DRAMs.

In the operation of the row match circuitry of FIG. 6, the row address including the corresponding array select bits of the address are stored in the latch 192 by means of the MUX 193. Each time a new address appears on the switch network address bus SW_REQ, the address is fed through the new request MUX 194 and compared with the previous request in the comparator 191. If there is a row match, a signal is generated on the output of the comparator 191 and transferred to the port interface by means of the signal line 195 which is a part of bus 70. The row match hit will prevent the port interface from de-asserting RAS and thereby saving RAS cycle time.

MUX 193 is used to extract the row address from the switch request address. The row address mapping to the switch address is a function of the DRAM configuration (e.g., 1M×1 or 4M×1 DRAM's) and the MAU data bus width (e.g., 32 or 64 bits).

Referring to FIGS. 1 and 5, the external bus arbiter 6 is a unit which consists primarily of a programmable logic array (PLA) and a storage element. It accepts requests for the MAU bus from the different CPU's, decides which of the CPU's should be granted the bus based on a software selectable dynamic or fixed priority scheme, and issues the grant to the appropriate CPU. The storage element is provided to store which CPU was last given the bus so that either the dynamic or flexible priority as well as the fixed or "round robin" priority can be implemented.

Referring to FIG. 7, dynamic switch and port arbitration as used in the multiprocessor environment of the present invention will now be described.

As described above, there are three masters and two 
resources which an MCU serves. The three masters are 
D-cache, I-cache and IOU. The two resources, i.e. 
slaves, are memory ports and IOU. As will be noted, the 
IOU can be both a master and a resource/slave. 

In accordance with the present invention, two differ- 
ent arbitrations are done. One is concerned with arbi- 
trating for the resources of the memory ports (port 0 to 
port 5) and the other is concerned with arbitrating for 
the resources of the switch network 54 buses SW_REQ 
and SW_WD. 

Several devices can make a request for data from main memory at the same time. They are the D and I-cache and the IOU. A priority scheme whereby each master is endowed with a certain priority is used so that requests from more "important" or "urgent" devices are serviced as soon as possible. However, a strict fixed arbitration scheme is not preferred due to the possibility of starving lower priority devices. Instead, a dynamic arbitration scheme is implemented which allocates different priority to the various devices on the fly. This dynamic arbitration scheme is affected by the following factors:

1. Intrinsic priority of the device. 

2. There is a row match between a requested address and the address of a previously serviced request.

3. A device has been denied service too many times.

4. The master has been serviced too many times.

As illustrated in FIG. 7, the dynamic priority scheme used for requesting the memory port is as follows.

Each request from a device has an intrinsic priority. 
The IOU may request a high or normal priority, fol- 
lowed by the D and then the I-cache. An intervention 
(ITV) request from a D-cache, however, has the highest priority of all.

Special high priority I/O requests can be made. This priority is intended for use by real-time I/O peripherals which must have access to memory with low memory latency. These requests can override all other requests except intervention cycles and row-match, as shown in FIG. 7.

The intrinsic priority of the various devices is modified by several factors, identified as denied service, I/O hog, and row match. Each time a device is denied service, a counter is decremented. Once the counter reaches zero, the priority of the device is increased with a priority level called DENY PRIORITY. These counters can be loaded with any programmable value up to a maximum value of 15. Once the counter reaches zero, a DENY PRIORITY bit is set which is finally cleared when the denied device is serviced. This method of increasing the priority of a device denied service prevents starvation. It should be noted that a denied service priority is not given to an IOU because the intrinsic priority level of the IOU is itself already high.
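The denied-service mechanism can be sketched as a small counter model. This is a minimal illustration, not the patented circuit: the class name `DenyCounter` and its methods are hypothetical, and reloading the counter when the device is serviced is an assumption, since the text only says the DENY PRIORITY bit is cleared at that point.

```python
class DenyCounter:
    """Toy model of the per-device denied-service counter."""

    MAX_PRELOAD = 15  # the text gives 15 as the maximum programmable value

    def __init__(self, preload):
        if not 0 <= preload <= self.MAX_PRELOAD:
            raise ValueError("preload must be between 0 and 15")
        self.preload = preload
        self.count = preload
        self.deny_priority = False  # models the DENY PRIORITY bit

    def denied(self):
        # Each denial decrements the counter; at zero the bit is set.
        if self.count > 0:
            self.count -= 1
        if self.count == 0:
            self.deny_priority = True

    def serviced(self):
        # Servicing the device clears the bit.
        self.deny_priority = False
        self.count = self.preload  # assumption: counter is reloaded here
```

With a preload of 3, for example, the bit is set on the third consecutive denial and cleared as soon as the device is serviced.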

Since the IOU is intrinsically already a high priority device, it is also necessary to have a counter to prevent it from being a port hog. Every time the IOU is granted use of the port, a counter is decremented. Once the counter reaches zero, the IOU is considered as hogging the bus and the priority level of the IOU is decreased. The dropping of the priority level of the IOU is only for normal priority requests and not the high priority I/O request. When the IOU is not granted the use of the port for a request cycle, the hog priority bit is cleared.

Another factor modifying the intrinsic priority of the 
request is row match. Row match will be important 
mainly for the I-cache. When a device requests a mem- 
ory location which has the same row address as the 
previously serviced request, the priority of the request- 
ing device is raised. This is done so that RAS need not 
be reasserted. 
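The row match test itself amounts to comparing the row portion of a new address with the latched row of the previously serviced request. The sketch below assumes a simple address layout (array select and row bits on top, then column bits, then a byte offset inside the data bus word); the function and class names, the default of 10 column bits and the 4-byte bus are illustrative assumptions, since the actual mapping depends on the DRAM configuration and MAU bus width.

```python
def row_address(address, column_bits=10, bus_bytes=4):
    """Return the row (plus array select) bits under the assumed layout."""
    offset_bits = bus_bytes.bit_length() - 1  # 4 bytes -> 2 bits, 8 bytes -> 3 bits
    return address >> (offset_bits + column_bits)


class RowMatch:
    """Toy model of the latch and comparator used for row match."""

    def __init__(self, column_bits=10, bus_bytes=4):
        self.column_bits = column_bits
        self.bus_bytes = bus_bytes
        self.latched_row = None  # no previous request yet

    def check(self, address):
        # Compare the new row with the latched one, then latch the new row.
        row = row_address(address, self.column_bits, self.bus_bytes)
        hit = row == self.latched_row
        self.latched_row = row
        return hit
```

A hit means the port can keep RAS asserted for the new access; a miss means the row must be changed.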

There is a limit whereby row match priority can be 
maintained, however. Once again a counter is used with 
a programmable maximum value. Each time a request is 
serviced because of the row match priority, the counter 
is decremented. Once the counter reaches zero, the row 
match priority bit is cleared. The counter is again pre- 
loaded with a programmable value when a new master 
of the port is assigned or when there is no request for a 
row match. The above-described counters are located 
in the switch arbitration unit 58. 
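The way these factors combine can be illustrated with a ranking function. The exact ordering is defined by FIG. 7, which is not reproduced here, so the levels below are an assumption pieced together from the text: intervention first, then row match, then high priority I/O, then denied-service requests, then normal intrinsic priority (IOU, D-cache, I-cache). Placing a hogging IOU last is purely a guess. All names are hypothetical.

```python
from dataclasses import dataclass

# Intrinsic order among devices: IOU, then D-cache, then I-cache.
DEVICE_RANK = {"IOU": 0, "D": 1, "I": 2}


@dataclass
class Request:
    device: str              # "IOU", "D" or "I"
    itv: bool = False        # intervention request
    high_io: bool = False    # special high priority I/O request
    row_match: bool = False  # row match priority bit set
    denied: bool = False     # DENY PRIORITY bit set
    hog: bool = False        # IOU hog bit set


def priority_key(req):
    """Smaller key means higher priority (assumed ordering, see lead-in)."""
    if req.itv:
        level = 0
    elif req.row_match:
        level = 1
    elif req.device == "IOU" and req.high_io:
        level = 2
    elif req.denied and req.device != "IOU":  # IOU gets no deny priority
        level = 3
    elif req.device == "IOU" and req.hog:
        level = 5  # assumption: a hogging IOU drops below normal requests
    else:
        level = 4
    return (level, DEVICE_RANK[req.device])


def select(requests):
    """Pick the request to service, or None if there are no requests."""
    return min(requests, key=priority_key) if requests else None
```

The sketch only ranks pending requests; the counters that set and clear the individual bits are kept separately.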

A write request for the memory port will only be 
granted when the write data bus of the switch 
SW_WD is available. If it is not available, another 
request will be selected. The only exception is for the 
intervention signal ITV. If SW_WD is not available, 
no request is selected. Instead, the processor waits for 
SW_WD to be free and then submits the request to the 
switch arbiter. 

The arbitration scheme for the switch network 54 is 
slightly different than that used for arbitrating for a 
port. The switch arbitration unit 58 of FIG. 3 utilizes 
two different arbitration schemes when arbitrating for a 
port which are selectable by software: 

1. Slave priority in which priority is based on the 
slave or the requested device (namely, memory or IOU 
port) and 

2. Master priority wherein priority is based on the 
master or the requesting device (namely, IOU, D and 
I-cache). 

In the slave priority scheme priority is always given 
to the memory ports in a round robin fashion, i.e. mem- 
ory ports 0, 1, 2 . . . first and then to IOU. In contrast, 
in the master priority scheme priority is given to the 
IOU and then to the D and I-cache, respectively. Of 
course, under certain circumstances it may be necessary 
or preferable to give the highest priority under the 
master priority to an ITV request and it may also be 
necessary or preferable to give I-cache a high priority if 
the pre-fetch buffer is going to be empty soon. In any 
event, software is available to adjust the priority scheme 
used to meet various operating conditions. 
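The two software-selectable schemes can be sketched as two selection functions over pending (master, slave) pairs. This is an illustrative model only: the function names and the tuple representation are hypothetical, and interpreting "round robin" as starting from the port after the one served last is an assumption.

```python
MASTER_ORDER = {"IOU": 0, "D": 1, "I": 2}  # master priority: IOU, D-cache, I-cache


def slave_priority(requests, last_port, num_ports):
    """Pick a request by slave: memory ports first (round robin), then IOU.

    requests is a list of (master, slave) pairs, where slave is a port
    number or the string "IOU".
    """
    port_requests = [r for r in requests if r[1] != "IOU"]
    if port_requests:
        # Assumed rotation: the port after last_port has the best rank.
        return min(port_requests,
                   key=lambda r: (r[1] - last_port - 1) % num_ports)
    return requests[0] if requests else None


def master_priority(requests):
    """Pick a request by master: IOU first, then D-cache, then I-cache."""
    if not requests:
        return None
    return min(requests, key=lambda r: MASTER_ORDER[r[0]])
```

An intervention request or an I-cache with a nearly empty pre-fetch buffer could be given a higher rank in `master_priority`, as the text allows, by adjusting the ordering table.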

Dynamic memory refresh is also based on a priority scheme. A counter coupled to a state machine is used to keep track of how many cycles have expired between refreshes, i.e. the number of times a refresh is requested, and has been denied because the MAU bus was busy. When the counter reaches a predetermined count, i.e. expired, it generates a signal to the port telling the port that it needs to do a refresh. If the port is busy servicing requests from the D or I caches or the IOU, it won't service the refresh request unless it previously denied a certain number of such requests. In other words, priority is given to servicing refresh requests when the refresh requests have been denied a predetermined number of times. When the port is ready to service the refresh request, it then informs the bus arbitration control unit to start arbitrating for the MAU bus.

A row is preferably refreshed every 15 microseconds and must be refreshed within a predetermined period, e.g. at least every 30 microseconds.

When RAS goes low (asserted) and CAS is not asserted, all CPU's know that a refresh has occurred. Since all CPU's keep track of when the refreshes occur, any one or more of them can request a refresh if necessary.

While preferred embodiments of the present invention are described above, it is contemplated that numerous modifications may be made thereto for particular applications without departing from the spirit and scope of the present invention. Accordingly, it is intended that the embodiments described be considered only as illustrative of the present invention and that the scope thereof should not be limited thereto but be determined by reference to the claims hereinafter provided.

What is claimed is:

1. In a multiprocessor system having a plurality of microprocessors, each of said microprocessors having a cache, a memory port, and an input/output unit (IOU), a memory control unit (MCU) in each of said microprocessors comprising:

a switch network;

a cache interface circuit;

means for coupling said cache interface circuit between said cache and said switch network;

an I/O interface circuit;

means for coupling said I/O interface circuit between said IOU and said switch network;

a memory port interface circuit;

means for coupling said memory port interface circuit between said memory port and said switch network;

switch arbitration means for arbitrating for said switch network;

port arbitration means for arbitrating for said memory port;

means for transferring to said port arbitration means a request to transfer data between one of said cache and said IOU and said memory port through said switch network and said port interface circuit;

means for transferring a port available signal from said port arbitration means to said switch arbitration means when said port interface circuit is free to process said request; and

means responsive to said port available signal for transferring a switch available signal from said switch arbitration means to the source of said request and to said port arbitration means when said switch network is free to process said request whereby data is enabled to be transferred between said one of said cache and said IOU and said memory port.

2. The MCU according to claim 1, wherein said switch arbitration means for arbitrating for said switch network includes means for providing one of the following arbitration schemes:

(a) slave priority in which priority is based on the slave device and priority is always given to the memory ports first and then to said IOU in a round robin fashion; and

(b) master priority in which priority is based on the master device and priority is given to said IOU and then to said cache.

3. The multiprocessor architecture according to claim 2, wherein highest priority is given to an intervention request.

4. The multiprocessor system according to claim 1, wherein said cache comprises one of a data cache (D-cache) and an instruction cache (I-cache).

5. The multiprocessor system according to claim 4, further comprising:

means for providing a dynamic priority to said request to transfer data as a function of intrinsic priority assigned to each of said IOU, D-cache and I-cache devices and a plurality of factors that can modify the intrinsic priority, including:

(a) existence of a row match between a requested address and a previously serviced request,

(b) how many times one of said devices has been denied service, and

(c) how many times one of said devices has been serviced without interruption.

6. The multiprocessor system according to claim 5, wherein said means for providing a dynamic priority assigns a D-cache intervention request highest intrinsic priority.

7. The multiprocessor system according to claim 5, wherein said requests having row matches have higher priority than intrinsic priority requests.

8. The multiprocessor system according to claim 5, wherein row match IOU device requests have higher priority than row match D-cache device requests, and row match D-cache device requests have higher priority than row match I-cache device requests.

9. The multiprocessor system according to claim 5, wherein said means for providing a dynamic priority assigns both high and normal intrinsic priority to requests from said IOU device, and normal intrinsic priority to requests from said D-cache device and said I-cache device.

10. The multiprocessor system according to claim 9, wherein said normal intrinsic priority IOU device requests are higher priority than said normal intrinsic priority D-cache device requests, and said normal intrinsic priority D-cache device requests are higher priority than said normal intrinsic priority I-cache device requests.

11. The multiprocessor system according to claim 10, further comprising:

means for decrementing a counter each time said D-cache or said I-cache device is denied service; and

means for increasing priority of D-cache and said I-cache device requests above said normal priority IOU device requests to thereby provide denied service priority requests for said D-cache and said I-cache devices once said counter reaches zero.

12. The multiprocessor system according to claim 11, wherein D-cache denied service priority requests are higher priority than I-cache denied service priority requests.

13. The multiprocessor system according to claim 11, wherein said counter can be loaded with any value up to a maximum value of 15.

14. The multiprocessor system according to claim 1, wherein said cache comprises a data cache (D-cache) and an instruction cache (I-cache).
15. The multiprocessor system according to claim 14, wherein said cache interface circuit comprises:

a D-cache interface circuit; and

an I-cache interface circuit.

16. The multiprocessor system according to claim 15, wherein said means for coupling said cache interface circuit between said cache and said switch network further comprises:

means for coupling said D-cache interface circuit between said D-cache and said switch network; and

means for coupling said I-cache interface circuit between said I-cache and said switch network.

17. The MCU according to claim 16 wherein said 
switch network comprises a switch request bus (SW_ 15 
REQ), a switch write data bus (SW_WD), and a switch 
read data bus (SW_RD) and further comprising: 

means for coupling said MCU to a memory array unit 
(MAU) via an MAU system bus, said MAU system 
bus including a MAU address bus, an MAU data 20 
bus and an MAU control signal bus; 

means for temporarily storing an address associated 
with a request to write to said MAU from one of 
said D-cache and said IOU if said MAU address 
bus is not then available to receive said address; 25 

means for temporarily storing write data from said 
source of said request to write to said MAU if said 
MAU data bus is not then available to receive said 
write data; 

means for transferring said address associated with
said request to write to said MAU from said source
of said request to write to said MAU to said switch
request bus (SW_REQ) and said write data associated
therewith to said switch write data bus
(SW_WD);

means for transferring said address associated with
said request to write to said MAU from said switch
request bus (SW_REQ) to said means for temporarily
storing said address associated with said request
to write to said MAU;

means for transferring said write data from said 
switch write data bus (SW_WD) to said means for 
temporarily storing said write data; and 

means for transferring said address from said means
for temporarily storing said address to said MAU
address bus and said write data from said means for
temporarily storing said write data to said MAU
address and write data buses when said MAU address
and write data buses are available to receive
said address and said write data.
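
By way of illustration only, the buffered write path recited in claim 17 may be modeled in software roughly as follows. The sketch is not part of the claimed circuit. The class name WriteBuffer, its methods, the dictionary standing in for the MAU, and the single flag that stands in for availability of both the MAU address bus and the MAU data bus are all assumptions introduced for this example.

```python
from collections import deque


class WriteBuffer:
    """Toy model of temporary address/data storage ahead of the MAU buses.

    Hypothetical names throughout; the claim recites hardware means,
    not this software structure.
    """

    def __init__(self):
        # (address, data) pairs that have crossed the switch network
        # (SW_REQ / SW_WD) but have not yet reached the MAU.
        self.pending = deque()
        # Stands in for the memory array unit (assumption).
        self.mau_memory = {}

    def write(self, address, data, mau_bus_free):
        # Every write is first placed in temporary storage.
        self.pending.append((address, data))
        # If the MAU buses are available, forward everything buffered so far.
        if mau_bus_free:
            self.drain()

    def drain(self):
        # Forward buffered writes oldest-first to the MAU.
        while self.pending:
            address, data = self.pending.popleft()
            self.mau_memory[address] = data
```

In this model a write issued while the MAU buses are busy simply waits in `pending`, and a later write issued when the buses are free flushes both in arrival order.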

18. The MCU according to claim 16 wherein said
switch network comprises a switch request bus (SW_REQ),
a switch write data bus (SW_WD), and a switch
read data bus (SW_RD) and further comprising:

means for coupling said MCU to a memory array unit
(MAU) via an MAU system bus, said MAU system 
bus including an MAU address bus, an MAU data 
bus and an MAU control signal bus; 

means for temporarily storing an address associated 
with a read request to read data from said MAU
from one of said D-cache, I-cache and IOU if said 
MAU address bus is not then available to receive 
said address; 

means for temporarily storing said read data from 
said MAU if said switch read data bus (SW_RD) is
not then available to transfer said read data; 

means for transferring said address associated with
said read request from said source of said request to
said switch request bus (SW_REQ) when said
switch request bus (SW_REQ) is available;

means for transferring said address associated with 
said read request from said switch request bus 
(SW_REQ) to said means for temporarily storing 
said address associated with said read request if 
said MAU address bus is not then available to re- 
ceive said address; 

means for transferring said read data from said MAU 
data bus to said means for temporarily storing said 
read data when said MAU address bus is available 
to receive said address and said switch read bus 
(SW_RD) is not available to transfer said read
data; and 

means for transferring said read data from said means 
for temporarily storing said read data to said switch 
read data bus (SW_RD) and from said switch read 
data bus (SW_RD) to said source of said request
when said switch read data bus (SW_RD) is avail- 
able to transfer said read data. 

19. The MCU according to claim 16 wherein said
switch network comprises a switch request bus (SW_REQ),
a switch write data bus (SW_WD), and a switch
read data bus (SW_RD) and further comprising:

means for transferring a request for an I/O data trans- 
fer between one of said D-cache and said I-cache 
and said IOU through said switch network and said 
I/O interface circuit; 

means for sending an IOU available signal from said 
I/O interface circuit to said switch arbitration 
means when said I/O interface circuit is available 
to process said request for an I/O data transfer; and 

means for transferring an address associated with said
request for an I/O data transfer to said I/O interface
circuit via said switch request bus (SW_REQ)
when said switch network is available to process
said request.

20. The MCU according to claim 16 wherein said
switch network comprises a switch request bus (SW_REQ),
a switch write data bus (SW_WD), and a switch
read data bus (SW_RD) and further comprising:

means for transferring write data from one of said 
D-cache and I-cache to said I/O interface circuit 
via said switch write data bus (SW_WD) when 
said request for an I/O data transfer is a write re- 
quest; and 

means for transferring read data from said IOU cir- 
cuit to one of said D-cache and I-cache via said 
switch read data bus (SW_RD) when said request 
for an I/O data transfer is a read request.

21. The MCU according to claim 16, further compris- 
ing: 

means for coupling said MCU to a memory array unit 
(MAU) via an MAU system bus, said MAU system 
bus including an MAU address bus, an MAU data 
bus and an MAU control signal bus; 

a test and set bypass circuit, said test and set bypass 
circuit having a snoop address generator coupled 
to said MAU address bus for generating snoop 
addresses corresponding to addresses on said MAU 
address bus and a content addressable memory 
(CAM); 

means responsive to the execution of a predetermined 
instruction associated with a request to access a 
shared memory region for storing the address of a 
semaphore associated with said region in said 
CAM;

means for comparing said snoop addresses with the
contents of said CAM on subsequent requests for 
said semaphore; and 

means for sending a semaphore failed signal to the 
source of said request for said semaphore if said 
semaphore address is still resident in said CAM to 
thereby save memory bandwidth. 

22. A multiprocessor system capable of supporting 
multiple processors according to claim 16, wherein said 
switch arbitration means for arbitrating for said switch 
network includes means for providing one of the fol- 
lowing arbitration schemes: 

(a) slave priority in which priority is based on the 
slave device and priority is always given to the 
memory ports first and then to said IOU in a round 
robin fashion; and 

(b) master priority in which priority is based on the 
master device and priority is first given to said IOU 
and then to said D- and I-cache. 

23. The multiprocessor system according to claim 22,
wherein I-cache requests are given higher priority than 
IOU and D-cache requests if a prefetch buffer in the 
processor corresponding to the I-cache request will 
soon be empty. 
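
As a non-limiting illustration, the master priority scheme of claim 22(b) together with the prefetch override of claim 23 may be sketched as a simple selection function. The function name pick_master, the string labels for the requesting devices, and the ordering of D-cache ahead of I-cache within the default order are assumptions made for this example; the claims state only that the IOU is served first and then the D- and I-cache.

```python
def pick_master(requests, prefetch_nearly_empty=False):
    """Return the requesting device to be granted, or None if nobody requests.

    requests: a set of device labels drawn from {"IOU", "D-cache", "I-cache"}.
    prefetch_nearly_empty: True when the prefetch buffer of the processor
        behind the I-cache request will soon be empty (claim 23 condition).
    """
    if prefetch_nearly_empty:
        # Claim 23: I-cache requests outrank IOU and D-cache requests.
        order = ["I-cache", "IOU", "D-cache"]
    else:
        # Claim 22(b): IOU first, then the caches (D before I is assumed).
        order = ["IOU", "D-cache", "I-cache"]
    for device in order:
        if device in requests:
            return device
    return None
```

For example, with both the IOU and the I-cache requesting, the IOU is selected unless the prefetch condition is asserted, in which case the I-cache is selected instead.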

24. The multiprocessor system according to claim 14,
further comprising: 

means for decrementing a counter each time said IOU 
device is granted use of said memory port; and 

means for lowering priority of said IOU device requests
below said normal priority D-cache and said
normal priority I-cache device requests once said
counter reaches zero.

25. The multiprocessor system according to claim 24, 
further comprising: 

means for clearing said counter to a predetermined
value when said IOU device is not granted use of
said memory port.
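
For purposes of illustration only, the counter behavior recited in claims 24 and 25 may be modeled as follows. The class name IouGrantCounter, the method names, and the default reload value of 4 are hypothetical and do not appear in the claims; the "predetermined value" of claim 25 is represented here by the reload_value parameter.

```python
class IouGrantCounter:
    """Toy model: demote the IOU after a run of consecutive grants."""

    def __init__(self, reload_value=4):
        if reload_value < 1:
            raise ValueError("reload_value must be at least 1")
        self.reload_value = reload_value  # the "predetermined value" (assumed)
        self.count = reload_value

    def iou_above_caches(self):
        # While the counter is non-zero the IOU keeps its priority over
        # normal-priority D-cache and I-cache requests.
        return self.count > 0

    def record(self, iou_granted):
        if iou_granted:
            # Claim 24: decrement each time the IOU is granted the memory port.
            if self.count > 0:
                self.count -= 1
        else:
            # Claim 25: restore the predetermined value when the IOU
            # is not granted the memory port.
            self.count = self.reload_value
```

With a reload value of 2, two consecutive IOU grants bring the counter to zero and the IOU drops below the caches; the first arbitration in which the IOU is not granted restores the counter.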

26. A method of transferring data in a multiprocessor
architecture capable of supporting a plurality of microprocessors,
each of said microprocessors having a
cache, a memory port, an input/output unit (IOU) and
a memory control unit (MCU), said MCU having a
switch network, a cache interface circuit, means for
coupling said cache interface circuit between said cache
and said switch network, an I/O interface circuit, means
for coupling said I/O interface circuit between said
IOU and said switch network, a memory port interface
circuit, means for coupling said memory port interface
circuit between said memory port and said switch network,
switch arbitration means for arbitrating for said
switch network and port arbitration means for arbitrating
for said memory port, comprising the steps of:

transferring to said port arbitration means a request to
transfer data between one of said cache and said
IOU and said memory port through said switch
network and said port interface circuit;

transferring a port available signal from said port
arbitration means to said switch arbitration means
when said port interface circuit is free to process
said request; and

transferring a switch available signal from said switch
arbitration means to the source of said request and
to said port arbitration means when said switch
network is free to process said request whereby
data is enabled to be transferred between said one
of said cache and said IOU and said memory port.

27. The method according to claim 26 wherein said
system comprises means for coupling said MCU to a
memory array unit (MAU) via an MAU system bus,
said MAU system bus including an MAU address bus,
an MAU data bus and an MAU control signal bus and
said switch network comprises a switch request bus
(SW_REQ), a switch write data bus (SW_WD), and a
switch read data bus (SW_RD), further comprising the
steps of:

temporarily storing an address associated with a request
to write to said MAU from one of said cache
and said IOU if said MAU address bus is not then
available to receive said address;

temporarily storing write data from said source of
said request to write to said MAU if said MAU data
bus is not then available to receive said write data;

transferring said address associated with said request
to write to said MAU from said source of said
request to write to said MAU to said switch request
bus (SW_REQ) and said write data associated
therewith to said switch write data bus (SW_WD);

transferring said address associated with said request
to write to said MAU from said switch request bus
(SW_REQ) to said means for temporarily storing
said address associated with said request to write to
said MAU;

transferring said write data from said switch write
data bus (SW_WD) to said means for temporarily
storing said write data; and

transferring said address from said means for temporarily
storing said address to said MAU address bus
and said write data from said means for temporarily
storing said write data to said MAU address and
write data buses when said MAU address and write
data buses are available to receive said address and
said write data.

28. The method according to claim 26 wherein said
system comprises means for coupling said MCU to a
memory array unit (MAU) via an MAU system bus,
said MAU system bus including an MAU address bus,
an MAU data bus and an MAU control signal bus and
said switch network comprises a switch request bus
(SW_REQ), a switch write data bus (SW_WD), and a
switch read data bus (SW_RD), further comprising the 
steps of: 

temporarily storing an address associated with a read
request to read data from said MAU from one of said
cache and IOU if said MAU address bus is not then
available to receive said address;

temporarily storing said read data from said MAU if 
said switch read data bus (SW_RD) is not then 
available to transfer said read data; 

transferring said address associated with said read 
request from said source of said request to said 
switch request bus (SW_REQ) when said switch 
request bus (SW_REQ) is available; 

transferring said address associated with said read 
request from said switch request bus (SW_REQ) 
to said means for temporarily storing said address 
associated with said read request if said MAU ad- 
dress bus is not then available to receive said ad- 
dress; 

transferring said read data from said MAU data bus to 
said means for temporarily storing said read data 
when said MAU address bus is available to receive 
said address and said switch read bus (SW_RD) is 
not available to transfer said read data; and 

transferring said read data from said means for temporarily
storing said read data to said switch read
data bus (SW_RD) and from said switch read data
bus (SW_RD) to said source of said request when
said switch read data bus (SW_RD) is available to
transfer said read data.

29. The method according to claim 26 wherein said
switch network in said system comprises a switch request
bus (SW_REQ), a switch write data bus
(SW_WD), and a switch read data bus (SW_RD), further
comprising the steps of:

transferring a request for an I/O data transfer between
said cache and said IOU through said switch
network and said I/O interface circuit;

sending an IOU available signal from said I/O interface
circuit to said switch arbitration means when
said I/O interface circuit is available to process said
request for an I/O data transfer; and

transferring an address associated with said request 
for an I/O data transfer to said I/O interface circuit 
via said switch request bus (SW_REQ) when said 
switch network is available to process said request. 

30. The method according to claim 26 wherein said
switch network in said system comprises a switch request
bus (SW_REQ), a switch write data bus
(SW_WD), and a switch read data bus (SW_RD),
further comprising the steps of:

transferring write data from said cache to said I/O
interface circuit via said switch write data bus 
(SW_WD) when said request for an I/O data 
transfer is a write request; and 

transferring read data from said IOU circuit to said 
cache via said switch read data bus (SW_RD)
when said request for an I/O data transfer is a read 
request. 

31. The method according to claim 26 wherein said
system comprises means for coupling said MCU to a
memory array unit (MAU) via an MAU system bus,
said MAU system bus including an MAU address bus,
an MAU data bus and an MAU control signal bus and a
test and set bypass circuit, said test and set bypass circuit
having a snoop address generator coupled to said
MAU address bus for generating snoop addresses corresponding
to addresses on said MAU address bus and a
content addressable memory (CAM), comprising the
steps of:

storing the address of a semaphore associated with a 
shared memory region in said CAM; 

comparing said snoop addresses with the contents of 
said CAM on subsequent requests for said sema- 
phore; and 

sending a semaphore failed signal to the source of said 
request for said semaphore if said semaphore ad- 
dress is still resident in said CAM to thereby save 
memory bandwidth. 

32. The method according to claim 31, further com- 
prising the steps of: 

releasing said semaphore and clearing said CAM in 
response to a write to said shared memory region. 
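
By way of example only, the steps of claims 31 and 32 may be modeled in software as shown below. The class name SemaphoreBypass, the use of a set to stand in for the content addressable memory, the memory_accesses counter, and the simplification that a release is signaled by a snooped write to the semaphore address itself are assumptions of this sketch and are not recited in the claims.

```python
class SemaphoreBypass:
    """Toy model of a test and set bypass using a CAM of semaphore addresses."""

    def __init__(self):
        self.cam = set()          # stands in for the CAM (assumption)
        self.memory_accesses = 0  # counts requests that actually reach memory

    def request_semaphore(self, address):
        """Return True if the request is forwarded to memory,
        False if a semaphore failed signal is returned locally."""
        if address in self.cam:
            # Address still resident in the CAM: fail the request without
            # a memory access, which is where the bandwidth is saved.
            return False
        # First request for this semaphore: store its address and go to memory.
        self.cam.add(address)
        self.memory_accesses += 1
        return True

    def snoop_write(self, address):
        # Claim 32: a write releases the semaphore and clears the CAM entry.
        self.cam.discard(address)
```

In this model a processor that spins on a held semaphore generates one memory access for the first request and none for the repeats, until a snooped write clears the entry.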

33. The method according to claim 26, including the 
step of arbitrating for said switch network according to 
one of the following arbitration schemes: 

(a) a slave priority scheme based on a requested de- 
vice including the step of assigning priority to the 
memory ports first and then to said IOU in a round 
robin fashion; and 

(b) a master priority based on a requesting device 
including the step of assigning priority to said IOU 
and then to said cache. 

34. The method according to claim 33, further includ- 
ing the step of assigning highest priority to an interven- 
tion request. 

35. The method according to claim 33, wherein one 
of said microprocessors in said multiprocessor architec- 
ture includes an instruction (I) cache and an instruction 
prefetch buffer, and wherein the method further includes
the steps of:

determining whether the prefetch buffer will soon
be empty; and if so

assigning higher priority to I-cache requests than
IOU and D-cache requests.